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Abstract 

The estimation of high dimensional quantum states is an important statistical problem aris¬ 
ing in current quantum technology applications. A key example is the tomography of multiple 
ions states, employed in the validation of state preparation in ion trap experiments [1]. Since 
full tomography becomes unfeasible even for a small number of ions, there is a need to investi¬ 
gate lower dimensional statistical models which capture prior information about the state, and 
to devise estimation methods tailored to such models. In this paper we propose several new 
methods aimed at the efficient estimation of low rank states in multiple ions tomography. All 
methods consist in first computing the least squares estimator, followed by its truncation to an 
appropriately chosen smaller rank. The latter is done by setting eigenvalues below a certain 
“noise level” to zero, while keeping the rest unchanged, or normalising them appropriately. We 
show that (up to logarithmic factors in the space dimension) the mean square error of the re¬ 
sulting estimators scales as r ■ d/N where r is the rank, d — 2'° is the dimension of the Hilbert 
space, and N is the number of quantum samples. Furthermore we establish a lower bound for 
the asymptotic minimax risk which shows that the above scaling is optimal. The performance of 
the estimators is analysed in an extensive simulations study, with emphasis on the dependence 
on the state rank, and the number of measurement repetitions. We find that all estimators 
perform significantly better that the least squares, with the “physical estimator” (which is a 
bona fide density matrix) slightly outperforming the other estimators. 


1 Introduction 

Recent years have witnessed significant developments at the overlap between quantum theory and 
statistics: from new state estimation (or tomography) methods [H 0 HI [3 0 0 H] , design of ex¬ 
periments [9l uni E] , quantum process and detector tomography [121 US] construction of confidence 
regions (error bars) [HI HSl [16] , quantum tests [171 HH] entanglement estimation [T9| , asymptotic 
theory [iniisiiiiiiia. The importance of quantum state tomography, and the challenges raised by 
the estimation of high dimensional systems were highlighted by the landmark experiment [T] where 
entangled states of up to 8 ions were created and fully characterised. However, as full quantum 
state tomography of large systems becomes unfeasible [Mj , there is significant interest in identifying 
physically relevant, lower dimensional models, and in devising efficient model selection and estima¬ 
tion methods in such setups [551 HSl HZl 0 0 HE] ■ In this paper we reconsider the multiple ions 
tomography (MIT) problem by proposing and analysing several new methods for estimating low 
rank states in a statistically efficient way. Below, we briefly review the MIT setup, after which we 
proceed with presenting the key ideas and results of the paper. 

In MIT 0, the goal is to statistically reconstruct the joint state of k ions (modelled as two-level 
systems), from counts data generated by performing a large number of measurements on identically 
prepared systems. The unknown state p is a d x d density matrix (complex, positive trace-one 
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matrix) where d = 2^ is the dimension of the Hilbert space of k ions. The experimenter can measure 
an arbitrary Pauli observable a^yCTy or of each ion, simultaneously on all k ions. Thus, each 
measurement setting is labelled by a sequence s = (si,..., s^) € {x,y, out of 3^ possible choices. 
The measurement produces an outcome o = (oi,...,Ofc) S {+1,-1}^, whose probability is equal 
to the corresponding diagonal element of p with respect to the orthonormal basis determined by 
the measurement setting s. The measurement procedure and statistical model can be summarised 
as follows. For each setting s the experimenter performs n repeated measurements and collects the 
counts of different outcomes A^(o|s), so that the total number of quantum samples used is N := nx3^. 
The resulting dataset is a 2^ x 3^ table whose columns are independent and contain all the counts 
in a given setting. A commonly used [1] estimation method is maximum likelihood which selects the 
state for which the probability of the observed data is the highest among all states. However, while 
this method seems to perform well in practice, and has efficient numerical implementations |29| . it 
does not provide confidence intervals (error bars) for the estimators, and it has been criticised for 
its tendency to produce rank-deficient states [2]. 

The goal of this paper is to find alternative estimators which can be efficiently computed, and work 
well for low rank states. The reason for focusing on low rank states is that they form a realistic 
model for physical states created in the lab, where experimentalists often aim at preparing a pure 
(rank-one) state. While this is generally difficult, the realised states tend to have rapidly decaying 
eigenvalues, so that they can be well approximated by low rank states. Our strategy is to combine 
an easy but “noisy” estimation method - the least square estimator (LSE) - with an appropriate 
spectral truncation, tuned using available data only, which sets spurious eigenvalues to zero and 
allows to reduce the mean square error of the estimator. 

The LSE is obtained by inverting the linear map A : p i—>■ Pp between the state and the 
probability distribution of the data, where the unknown probabilities are replaced by the empir¬ 
ical (observed) frequencies of the measurement data. The resulting estimator is unbiased, and is 
“optimal” in the sense that it minimises the prediction error, i.e. the euclidian distance between 
the empirical frequencies and the predicted probabilities. However, one of the disadvantages of the 
LSE is that it does not take into account the physical properties of the state, i.e. its positivity and 
trace-one property. More importantly, as we explain below, the LSE has a relatively large estima¬ 
tion error for the class of low rank states, and performs well only on very mixed states. This is 
illustrated in Figure]^ where the eigenvalues of are plotted (in decreasing order) against those 
of the true state p, the latter being chosen to have rank r — 2. We see that while the non-zero 
eigenvalues of p are estimated reasonably well, the LSE is poor in estimating the zero-eigenvalues, 
and as consequence, it has a large estimation variance. 

Our goal is to design more precise estimators, which have the LSE as a starting point, but take 
into account the “sparsity” properties of the unknown state. Figure suggests that the non-zero 
eigenvalues of the LSE which are below a certain “statistical threshold”, can be considered as 
statistical noise and may be set to zero in order to improve the estimation error. To find this noise 
level, we establish a concentration inequality (see Proposition]^ which shows that the operator-norm 
error is upper bounded by a rate which (up to logarithmic factors in d) is proportional 

to d/N. 

The first estimator we propose, is a rank penalised one obtained by diagonalising the LSE, arranging 
its eigenvalues in decreasing order of their absolute values, and setting to zero all those eigenvalues 
whose absolute values are below the threshold u 

d 

The same outcome can be obtained as solution of the following penalised estimation problem: among 
all selfadjoint matrices, choose the one that is close to the LSE but in the same time it has low rank. 
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Figure 1: Eigenvalues of the LSE (red) arranged in decreasing order, versus those of the true state 
of A: = 4 ions of rank r = 2 (blue), for n = 20 measurement repetitions (LEET) and n = 100 
measurement repetitions (RIGHT). 


so that it minimises over r the norm-two square discrepancy penalised by the rank 

In particular the estimator’s rank is determined by the data. In Theorem we show that if p is of 
unknown rank r < d, then the mean square error (MSE) — p||| is upper-bounded (up to 

logarithmic factors) by the rate (r • d)/N. This captures the expected optimal dependence on the 
number of parameters for a state of rank r. Indeed, in section we show that no estimator can 
improve the above rate for all states of rank r, cf. Theorem for the asymptotic minimax lower 
bound. 

The penalised estimator has however the drawback that it may not represent a physical state. To 
remedy this, and further improve its statistical accuracy, we propose a physical estimator which is 
the solution of the following optimisation problem. We seek the density matrix which is closest to 
the LSE and whose non-zero eigenvalues are larger that the threshold 4i/. It turns out that the 
solution can be found via a simple iterative algorithm whereby at each step the eigevalues of 
below the threshold are set to zero, and the remaining eigenvalues are normalised by shifting with 
a common constant, while the eigenvectors are not changed throughout the process. In Theorem 
we show that the physical estimator satisfies a similar upper bound to the penalised one. 

In section]^ we present results of extensive numerical investigations of the two proposed estimators. 
In addition we consider the oracle “estimator”, which is simply the spectral truncation of the LSE 
that is closest to the true state p, and the cross-validated estimator which aims at finding 
the optimal truncation rank by estimating the Erobenius error by cross-validation. In fact, we 
found that cross-validation can help in better tuning the constant factor of the threshold rate of 
the penalised and physical estimators. As expected from the theoretical results, we find that all 
estimators perform significantly better than the LSE on low rank states; moreover the physical 
estimator has slightly smaller estimation error than the others, including the oracle estimator. We 
also find that all methods converge to the correct rank in the limit of large number of repetitions 
but through different routes: the penalised estimator tends to underestimate, while the physical one 
tends to overestimate the rank, for small number of samples. 

Having discussed the upper bounds on the estimators’ MSE, we would like to know how they 
compare with the best possible estimation procedure. One way to characterise the latter is through 
the asymptotic minimax risk for the class of states of a given rank r. From asymptotic statistics 
theory m we know that for every sequence of estimators Pn, the following lower bound for its 
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asymptotic maximum risk over the set Sd,r of state of rank r holds 

lim inf n • 3^ sup E|lp„ —p|| 2 > sup Tr(/(p)“^G'(p)). 

On the right side, I{p) is the Fisher information corresponding to all measurement settings taken 
together, and G{p) is a positive matrix describing the quadratic approximation of the Frobenius 
distance around p. In Theorem]^ we show that the right side is lower bounded by 2r{d — r) which 
shows that (up to logarithmic factors) the upper bounds of the penalised and physical estimators 
have the same scaling as the asymptotic minimax risk. 

Recently, a number of papers discussed related aspects of quantum tomography problems. The idea 
of the penalised estimator has been proposed in [S], which provided a weaker upper bound for its 
MSE. Reference |7] analyses model selection methods for finite rank models and maximum likelihood 
estimation. Reference m proposes a different estimator and establishes a comparable upper bound 
for its MSE. The class of low rank states is also employed in compressed sensing quantum tomography 
[25l |32l [28] , but their statistical model is based on expectations of Pauli observables rather than 
measurement counts. 

The paper is organised as follows. In sectionj^we describe the measurement procedure and introduce 
the statistical model of MIT. In section|^we define the linear (least squares) estimator and derive an 
upper bound on its operator norm error which improves on a previous bound of [5]. In section we 
define the penalised and threshold estimators and derive upper bounds for their mean square errors 
with respect to the norm-two square (Frobenius) distance. The performance of the different methods 
is analysed in Section An asymptotic lower bound for the minimax risk is derived in section 
based on the Fisher information of the measurement data. The upper and lower bounds match in 
the scaling with the number of parameters and number of total measurements, up to a logarithmic 
factor. We give a detailed description of the numerical implementation of the algorithms, including 
the cross-validation routines used for tuning the pre-factor of the penalty and threshold constants. 
We illustrate the simulation results with box plots of the Frobenius errors for the least squares, 
oracle, cross-validation, penalisation and threshold estimator, for states of ranks 1,2,6 and 10, and 
for different choices of measurement repetitions n = 20,100. Additionally, we plot the empirical 
distribution of the chosen rank for different estimators, showing the concentration on the true rank 
as the number of repetitions increases. 


2 Multiple ions tomography 

This paper deals with the problem of estimating the joint quantum state of k two-dimensional 
systems (qubits), as encountered in ion trap quantum tomography [T] . The two-dimensional system 
is determined by two energy levels of an ion, while the remaining levels can be ignored as they 
remain unpopulated during the experiment. The joint Hilbert space of the ions is therefore the 
tensor product where d = 2^, and the state is a density matrix p on this space, i.e. a 

positive d X d matrix of trace one. 

Our statistical model is derived from standard ion trap measurement procedures, and takes into 
account the specific statistical uncertainty due to finite number of measurement repetitions. We 
consider that for each individual qubit, the experimenter can measure one of the three Pauli observ¬ 
ables ax, ay, a^. A measurement set-up is then defined by a setting s = (si,..., Sk) € Sk ■= {x, y, z}^ 
which specifies which of the 3 Pauli observables is measured for each ion. For each fixed setting, the 
measurement produces random outcomes o G Ok '■= {+1, —1}^ with probability 

p(o|s) := Tr(pP^) = (e^|p|e^), (2.1) 

where P® are the one-dimensional projections 

= ( 2 - 2 ) 
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and |e®) are the eigenvectors of the Pauli matrices, i.e. crs|e±) = ± |e^). 

The measurement procedure consists of choosing a setting s, and performing n repeated measure¬ 
ments in that setting, on identically prepared systems in state p. This provides information about 
the diagonal elements of p with respect to the chosen measurement basis, i.e. the probabilities p(o|s). 
In order to identify the other elements, the procedure is then repeated for all 3^ possible settings. 


Before describing the statistical model of the measurement counts data, we start by discussing in 
more detail the relation between the unknown parameter p and the probabilities p(o|s). Consider the 
“extended” set of Pauli operators {crx, CTy, cr^, cr/ := 1} which form a basis in We construct 

the tensor product basis in with elements an = ® • • • 0 ab^ where b G {x,y,z,I}^ and 

note that the following orthogonality relations hold Tr(crbO’c) = ddb.c- The state p can be expanded 
in this basis as 

p= ^ pbCTb, where pb = Tr(p(Tb)/d. (2.3) 


Equation (2.11 can then be written as 


P(o|s) = ^ pbTr(crbPJ) = ^ pbA(o|s). 

h£{I ,x,y,z}^ h£{I ,x,y^z}^ 


The coefficients Ab(o|s) can be computed explicitly as 

Ab(o|s) = TiVbPo^) = n (2.4) 


A k 

where Pb := {* : h = !}■ Let p G be the representation of p as a the vector of coefficients 
Pb, and let p be the corresponding vector of probabilities for all settings (p(o|s) : (o, s) G Ofc x Sk), 
with settings, and outcomes within settings ordered in lexicographical order. The measurement is 
then described by the linear map A : —>■ C) with matrix elements Ab(o|s) defined in 

such that 

(2.5) 

A)~^ • A* • p, which 


(2.4) 


Ap. 


The linear map A is injective and its inverse can be computed as p = (A* 
together with the decomposition (2.3) and Lemma implies 


^ = E E ^ Pb^Tb, 

bos b 


where d{h) := iPbl- 


( 2 . 6 ) 


The above formula allows to reconstruct the matrix elements from the measurement probabilities. 
However, since the experiment only provides random counts from these probabilities, we need to 
construct a statistical model for the measurement data. After n repetitions of the measurement with 
setting s, we collect independent, identically distributed observations € Ofc, for i from 1 to n. 
The data can be summarised by the set of counts {A(o|s) : o G Ok}, where A(o|s) = = o) 

is the number of times that outcome o has occurred. After repeating this for each setting s G Sk, 
we collect all the data in the counts dataset D := {A(o|s) : (o,s) G Ok x Sk}- Since successive 
preparation-measurement cycles are independent of each other, the probability of a certain dataset 
D is given by the product of multinomials 

FpiD) = P, ({A(o|s) : (o,s) G x Vk}) = H n^(o|s)! (2-7) 

The statistical problem is to estimate the state p from the measurement data summarised by the 
counts dataset D. The most commonly used estimation method is maximum likelihood (ML). The 
ML estimator is defined by 

^pnO(£)) a,rgmaxPo.(D) 
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where the maximum is taken over all density matrices t on and can be computed by using 
standard maximisation routines, or the iterative algorithms proposed in [291 I33j . However, ML 
becomes impractical for about fc = 10 ions, and the iterative algorithm has the drawback that 
it cannot be adapted to models where prior information about the state is encoded in a lower 
dimensional parametrisation of the relevant density matrices, e.g. when the states are low rank. 
In the next section we discuss an alternative method, the least square estimator, and derive an 
upper bound on its mean square error. After this, we will show that by “post-processing” the least 
squares estimator using penalisation and thresholding methods, its performance can be considerably 
improved when the unknown state has low rank. 


3 The linear (least squares) estimator 


Since the vectorise version p of the state p satisfies (2.5), it is the solution of the optimisation 


p = arg inf ||p-Aff, 
fec*'‘ 


giving p = pb • CTb in (2.6). If the number of repetitions n is large compared with the dimension d, 
then the outcomes’ empirical frequencies are good approximations of the corresponding probabilities, 
i.e. /(o|s) := N{o\s)/n p(o|s) by the law of large numbers. Therefore, by replacing p by the 
vector of frequencies f in in the previous display, we can define the least square estimator of p 


pl'^):=arg inf ||f - Af 


which has the explicit expression 


rn 


bos 


^b(o|s) 

2 fe 3 <i(b) 


(3.1) 


Note that in this case it comes down to replacing the unknown probability p in equation (2.6) with 
the empirical frequencies f (also known as the plug-in method). 


In spite of this “optimality” property and its computationally efficiency, the least square estimator 
has the disadvantage that in general it is not a state, i.e. it is not trace-one and may have negative 
eigenvalues. A more serious disadvantage is that its risk - measured for instance by the mean square 
error E(||p^**^ — pUf) - is large compared with other estimators such as the maximum likelihood 
estimator. This is due to the fact that the linear estimator does not use the physical properties 
of the unknown parameter p, that is positivity and trace-one. As we will see below, the modified 
estimators proposed in Section outperform the least square while adding only a small amount of 
computational complexity. Moreover, the second estimator will be a density matrix. 


In the remainder of this section we provide concentration bounds on the square error of the linear 
estimator, which will later be used in obtaining the upper bounds of the improved estimators. The 
following Proposition improves the rate A:(4/3)^/n obtained in [5] to A:(2/3)*’/n. 

Proposition 1. Let be the linear estimator of p. Then, for any e > 0 small enough the 
following operator norm inequality holds with probability larger than 1 — e under Pp 




where 






with N := n • 3^ the total number of measurements. The same bound holds when k = k{n) as long 
as ^{e) 0. 
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Proof. See Appendix |8.2[ 

As a side remark we note that projecting the least-squares estimator onto the space of Hermitian 
matrices with trace 1, does not change the rate of convergence from Proposition The following 
Proposition allows us to assume that, without loss of generality, the least-squares estimator has also 
trace 1. 


Proposition 2. Under the notation and assumptions in Proposition^ let 


p<ls,n) 

rn 


= argmin||r-^„'^)||i. 

T:tr{r) = l 


(3.2) 


Then with probability larger than 1 


e we have 


P\\ < 2j^(£)- 


Proof. See Appendix |8.3[ 


4 Rank-penalised and threshold projection estimator 

In this section we investigate two ways to improve the least-squares estimators. The first method 
is to project the least-squares estimator onto the space of finite rank Hermitian matrices of an 
appropriate rank. We prove upper bounds for its risk with respect to the Frobenius (norm-two) 
distance. Building on the knowledge about the rank-penalised estimator, we define the second 
estimator which is the projection of the least-squares estimator on the space of physical states 
whose eigenvalues are larger than a certain positive noise threshold. We give an simple and fast 
algorithm producing a proper density matrix from the data, which also inherits the good theoretical 
properties of the rank-penalized estimator. 


4.1 Rank-penalised estimator 

We introduce here the rank-penalised nonlinear estimator, which can be computed from the least- 
squares estimator by truncation to an appropriately chosen rank. 

As noted earlier, while the least-square estimator is unbiased, it has a large variance due to the fact 
that it does not take into account the physical constraints encoded in the unknown parameter p. 
A possible remedy is to “project” the least squares estimator onto the space of physical states, i.e. 
positive, trace-one matrices. This method will be discussed in the following subsection. Another 
improvement can be obtained by taking into account the “sparsity” properties of the unknown state. 
For instance, in many experimental situations the goal is to create a particular low rank, or even 
pure state. The fact that such states can be characterised with a smaller number of parameters 
than a general density matrix, has two important consequences. Firstly, they can be estimated by 
measuring an “informationally incomplete” set of observables, as demonstrated in [251l26j . Secondly, 
the prior information can be used to design estimators with reduced estimation error compared with 
generic methods which do not take into account the structure of the state. Roughly speaking, this is 
because each unknown parameter brings its own contribution to the overall error of the estimator. 

However, the downside of working with a lower dimensional model is that it contains built-in as¬ 
sumptions which may not be satisfied by the true (unknown) physical state. Preparing a pure state 
is strictly speaking rarely achievable due to various experimental imperfections, so using a pure state 
statistical model is in fact an oversimplification and can lead to erroneous conclusions about the true 
state. On the other hand, one can argue that when the (small) experimental noises are taken into 
account, the actual state is “effectively” low rank, i.e. it has a small number of significant eigenval¬ 
ues and a large number of eigenvalues which are so close to zero that they cannot be distinguished 
from it. Then, the interesting question is how to decide on where to make the cut-off between 
statistically relevant eigenvalues and pure statistical noise. This is a common problem in statistics 
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Figure 2: The norm-two square error ||pn(^) ~ plli of truncated LSE as a function of rank, for 
a rank 6 state, k = 4 (d = 16) and n = 500 measurement repetitions. 


which is closely related to that of model selection [53]. Below we describe the rank-penalised esti¬ 
mator addressing this problem, and show that its theoretical and practical performance is superior 
to the least squares estimator, and is close to what one would expect from an optimal estimator. In 
addition, its computation requires only the diagonalisation of the least-squares estimator. 


Before presenting a simple algorithm for computing the estimator, we briefly discuss the idea behind 
its definition. Let 

d 




(4.1) 


be the spectral decomposition of the least squares estimator, with eigenvalues ordered such that 
|Ai| > ... > lAj/j. For each given rank k G {1,... ,d} we can project onto the space of matrices 
of rank k by computing the matrix which is the closest to with respect to the Frobenius distance 


P„(k) := argmin 

r; rank(T)=/si 




Although the projection is not a linear operator, Pnin) is easy to compute, and is obtained by 
truncating the spectral decomposition (4.11 to the most significant k eigenvalues 


Pn{K) = 

2=1 


The question is now how to choose the rank k in order to obtain a good estimator. In Figure 
we illustrate the dependence of the norm-two square error e(«;) := \\pn{K) — p|jf on the rank, for a 
particular dataset generated with a rank 6, 4-ions state. As the rank is increased starting with k = I 
(pure states), the error decreases steeply as p„(k) becomes less biased, it reaches a minimum close 
to the true rank, and increases slowly as added parameters increase the variance of the estimator. 
However, since the state p is unknown, the norm-two error and optimal rank for which the minimum 
is achieved, are unknown. To go around this, we can estimate the error e^n) from the data by means 
of e.g. cross-validation, as it will be described in section However, in this section we follow a 
different path, and we define the rank-penalised estimator [5], |5S| as the minimiser over k of the 
following expression: 

d 

\\Pn{l^) + ■ l^= + 

where v is & constant which will be tuned appropriately. The first term quantifies the fit of the 
truncated estimator with respect to the least squares estimator, while the second term is a penalty 




which increases with the complexity of the model, i.e. the rank. The rank penalised estimator 
is thus the solution of the simple optimisation problem 


Wpen) 

rn 


Pnik), 


where 


K := argmm 


" d 

E 


\x^ 


■ K 


max{K : > i^^}(4.2) 


This means that the eigenvalues below a certain noise threshold are set to zero while those above 
the threshold remain unchanged. The following theorem is our first main result, and shows that the 
appropriate threshold is given by the upper bound on the operator norm error of the least squares 
estimator, as established in Proposition 

Theorem 1. Let 6 > 0 be an arbitrary constant, let c{9) := 1 + 2/9, and let e > 0 be a small 
parameter. Then with probability larger than \ — e, we have 


l^pen) _ p|j2 < ^ C^{9) A|(p) + 2c{9)v{£Yk 

j>K 


where is the penalised estimator defined in (4.2) with threshold v{£)^ given by 




2 /2 


n \ 3 


log f 


/2fc+i 


2d, 2d 
N £ 


which is assumed to be o(l) with increasing n and k. 


(4.3) 


Proof. The upper bound follows directly from Proposition combined with the following oracle 
inequality established in [5], 


-)-p||^< min . 


c^{9) min \\R — p \\2 + 2c{9) ■ k ■ i/{£) 

R:Tank{R)=K 


which holds true provided that + This event occurs with probability larger 

than 1 — £. □ 


Let us make some explanatory remarks on the above result. Firstly, the bound (4.3) applies to all 
states p, not only “small” rank ones. Recall that Sd,r denotes the set of states of rank-r states on 
C"^. In the special case when rank(p) = r < d, the theorem implies that, with probability larger 
than 1 — e, 

If the rank r is much smaller than d, this bound is a significant improvement to the corresponding 
upper bound d ■ v(£)'^ for the least square estimator, which can be derived from the operator norm 
bound of Proposition Moreover, up to a constant factor the rate is equal to d ■ rlog(d)/N 

which is essentially the ratio of the number of parameters and total number of measurements. In 
sectionj^we will show that apart from the log factor this rate is also optimal, and cannot be improved 
even if the rank of the state is known, which indicates that the estimator adapts to the complexity 


of the true parameter. Furthermore, we stress the fact that the bound (4.3) holds true for growing 
dimension d = 2^ as well as the number of measurements n; the bound remains meaningful as far 
as d\ogd/N —^ 0. 

The second observation is that our procedure selects the true rank consistently. Denote by k the 
rank of the resulting estimator Following [S] we can prove that, if there exists some k such 

that Ak(p) > (1 + 5)\/j^(e) and Ak+i(p) < (1 — <5) \/v{£) for some (5 S (0,1), then 


= k) = 1 — 


- p\\ > 
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This stresses the fact that the procedure detects the eigenvalues above a threshold related to the 
error of the least squares estimator. If the true rank of p is r and if k and n are such that ly tends to 
0 (which always occurs for fixed number of ions k), then > v asymptotically and the probability 
that k = r tends to 1. 

We can also project on the matrices with trace 1, to get 


•^pen,n) 

Pn 


arff mm 
RGS 




2? 


(4.4) 


where S is the set of all density matrices on C^. The following Corollary shows that the key properties 
of the estimator are preserved if we additionally normalise it to trace-one after thresholding. 

Corollary 1. Under the notation and assumptions of Theorem^ if p is an arbitrary state in Sd,r 
and if k and n are such that Xr > i'{e) for some e G (0,1), then, with probability larger than 1 — e, 


Il^r”’”^-Pll^<8c(0).r-Ke)^. 


Moreover, there exists an absolute constant C > 0 such that 

sup i?p||pl~)-p||^<C^logf?^). 


Proof. 


See Appendix 



□ 


4.2 Physical threshold estimator 


Although the rank-penalised estimator performs well in terms of its risk, it is not necessarily positive 
and trace-one and therefore it may not represent a physical state. In this section we propose and 
analyse the following “physical estimator” 


pfphys) 

rn 


= arg min ||a-p<(^)||2, 
creSU) 


(4.5) 


where pl^*^ is the ‘“normalised least squares estimator” defined in (3.2), and S{iy) denotes the set of 
states at noise level n 


S(iy) = {cr density matrix with eigenvalues G {0} U {An, 1], j = 1,..., d} . 


In particular, the space of all density matrices correspond to v = 0 and is denoted S. The estimator 
jg therefore the physical state which is closest to the (normalised) least square estimator, and 
whose non-zero eigenvalues are above the threshold An. 

Before analysing the performance of the estimator, we describe its numerical implementation through 
the following simple iterative algorithm. 

Let Ai > • • • > Ad denote the eigenvalues of pn*^ Let £ = 0, and define xf'^ = Xj for j = 1,... ,d. 
For £ = I ,..., d, do 
if > 4i., STOP; 

else, put A^^2 ^+i = 0 and 




d-e 


d-i 




k=l 



for j = l,...,d-£; 


£ = £ + 1. 
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The algorithm checks whether the smallest eigenvalue is larger than the noise level 4z/ and if it is 
not, then sets its value to 0 and distributes the mass of the erased eigenvalue in such a way that 
they sum to 1. This algorithm is similar to that proposed by Smolin et al. [1], with the important 
difference that we do not keep all positive eigenvalues but only significantly positive eigenvalues. 
Here, significant means larger than the noise threshold of the order of the operator-norm error of 
the least-squares estimator. Indeed, this noise can give a confidence interval for each eigenvalue. 

If the total number of iterations is i = d — r then the estimator has rank r. Its eigenvalues 

are equal to 0 for j > r, while, for j <r they are given by 

^(phys) where ^ Aj, = 1 - ^ A^,. 

fc>r fc<r 

This implies that has decreasing eigenvalues and > Au. The following Theorem shows 

that is rank-consistent and its MSE has the same scaling as that of penalised estimator 

Theorem 2. Assume that the state p has rank r, i.e. belongs to Sd,r- For small e > 0, let v = i'{e) 
he defined as in Theorem^^ and assume that Xr > 8i'{e). Then, with Pp probability larger than 1—e 
we have r = r and 

Moreover, there exists an absolute constant C > 0 such that 

sup -pg<c^-^ log (^). 


Proof. See Appendix |8.5[ 


5 Lower bounds for rank-constrained estimation 

The goal of this section is to investigate how the convergence rates of our estimators compare with 
that of an “optimal estimator” for the statistical model consisting of all states of rank up to r. For 
this we will derive a lower bound for the maximum risk of any estimator. 

In this section p„ will be an arbitrary estimator and the true state p is assumed to belong to the set 
Sd,r of rank-r states. To quantify the overall performance of pn, we define the maximum risk 

Fjnaxi^Pn ^— SUp Epljp^^ P||2’ 
peiSd.r 

In view of the previous upper bounds, we expect its asymptotic behaviour (in terms of the total 
number of measurements, for a large number of repetitions n) to be 

Rmax{pn]r) = ^ ■ 0(1), N = n ■ 8^. 

Taking this into account we define the (appropriately rescaled) minimax risk as 

Rminraax{Tik,n^ .— illf (^■^) 

Pn 

which describes the behaviour of the best estimator at the hardest to estimate state. The next 
theorem provides an asymptotic lower bound for the minimax risk. It shows that the maximum 
MSE of any estimator is al least of the order of r{d — r)/N, which for low rank states scales as 
#parameters/#samples, which up to logarithmic factors is the same as the upper bounds derived 
in Theorems Corollary and Theorem 
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Theorem 3. The following lower bound holds for the asymptotic minimax risk holds 

liuiinf Rminmax{r,k;n) > 2r{d-r). 

n—¥OQ 


Proof. The minimax risk captures the worst asymptotic behaviour of the rescaled risk, over all states 
of rank r. In order to bound the risk from below, we construct a (lower dimensional) subfamily of 
states TZd,r C Sd,r such that the maximum risk for this subfamily provides the lower bound. Let 

po :=Diag ^^,...,^,0, (5.2) 

be a diagonal state with respect to the standard basis, and define TZd,r to be the set of matrices 
obtained by rotating po with an arbitrary unitary U, i.e. TZd.r '■= {p ■= UpoU* \ U unitary}. This 
is a smooth, compact manifold of dimension 2r{d — r) known as a (complex) Grasmannian []. At 
each point p = UpoU* we consider the ONB By obtained by rotating the standard ONB B by the 
unitary U. With respect to this basis, we consider first the parametrisation of an arbitrary density 
matrix p' by its matrix elements, more precisely by the diagonal, real and imaginary parts of the 
off-diagonal matrix elements, such that p' = pe with 

e = (6»W,6iM,6»(*)) 

:= (pii,...,pdd;Repi,2,-.-,Repd-i,d;Impy2,---,Inipd_yd)- (5.3) 


The Frobenius distance is given by 

wpb, - ps^wi = \\ei - o^r+m - m 


(dl - d2)^G(di - 92), 


where G is the constant diagonal weight matrix G = Diag(ld, 2 • 'kd(d-i)/ 2 ^ 2 • 'kd(d-i)/ 2 )- However, 
this parametrisation does not take into account the prior information about the rank of the true 
state, and moreover, our key argument involves the even smaller family TZd,r of states. We will now 
focus on providing a local parametrisation of TZdr around p = UpoU* . With respect to the basis 
By, a state p' G TZd,r in the neighbourhood of p has the form 




p' = p + Aoff -I- (5 = 


0 A 

At 0 


0(||Af) 0 \ 

• (5.4) 

0 o(|!A||2) J 


where A is a matrix of free (complex) parameters, and the two 0(|jA|p) blocks are r x r and 
respectively {d — r) x (d — r) matrices whose elements scale quadratically in A near A = 0. The 
intuition behind this decomposition is that a small rotation of p produces off-diagonal blocks of the 
size of the “rotation angles” while the change in the diagonal blocks are only quadratic in those 
angles. Since we are interested in the asymptotic behaviour of estimators, the local approach is 
justified, and the leading contribution to the Frobenius distance comes from the off-diagonal blocks. 
More precisely, if pi,P 2 G Tld,r are in the neighbourhood of p then 


llpi - P2f = 2\\0l - + 2\\9l - 9ir + Oi\\94\ ||d2r), 

where d”, d* are the real and imaginary parts of the off-diagonal elements contained in the block A, 
i.e. for i < r < j. Locally, the manifold TZd,r can be parametrised by 9 := {9'', d®). 


Since TZd,r C Sd^r the maximum risk for the model consisting of rank-r states is bounded from below 
by that of the (smaller) rotation model Tld,r 

inf sup Ep||p„-p ||2 > inf sup Ep||p„-p ||2 (5.5) 

pGSd.r PdTZd.r- 

Let TT be the “uniform” distribution over TZd,r- To draw a sample from this distribution, one can 
choose a random unitary U from the Haar measure over unitaries, and defines p := UpoU*. Then 
the maximum risk is bounded from below by the Bayes risk 

sup Ep\\p„-p\\l> [ Ep\\p^- p\\lTT{dp) (5.6) 

pe’^d.r- Jnd,,. 
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By applying the van Trees inequality in [321 |37j) we get that 



p\\lT^{dp) > 


1 

n 


Tr(G(p)J-i(p)V(dp)-4, 


(5.7) 


where a > 0 is a constant which does not depend on n. Here, I{p) is the (classical) Fisher information 
matrix of the the data obtained by performing one measurement for each setting, and G{p) is the 
weight matrix corresponding to the quadratic approximation of the Frobenius distance around p. 
Both matrices are of dimensions 2r{d— r) = dim(77,d,j.), and depend on the chosen parametrisation, 
but the trace is independent of it. Inserting ( |5.6| ) and ( |5.7| ) into ( |5.5| ), we get 

inf sup lVEp||p„ - pII^ > — /" Tr(G(p)^/^/(p)"^G(p)^/^)7r(dp) - 

Since t >—>■ is an operator convex function we have 

J G{py^‘^i{p)-^G{py^^TT{dp) > (^J G{p)-^/‘^i{p)G{p)-^/‘^TT{dp)j 


and by taking the limit n —^ oo we obtain the asymptotic minimax lower bound 


Rminmaxir, k) >S^Tt G{p) ^/‘^i{p)G{p) ^/^7r(dp) 


(5.8) 


where R 


minmax 


(r, fc; n) is the minimax risk defined in equation (5.1) 


At this point we choose a convenient local parametrisation around an arbitrary state p S IZd^r- As 
discussed in the beginning of the proof we showed that for this we can use the real and imaginary 
parts 9 = (0’’,0®) of the off-diagonal block A, and that the corresponding weight matrix is G{p) = 


2l2r{d-r)- The lower bound (5.8) becomes 


R. 


minmax 


(r, /c) > 3'= • 2 



i{p)TT{dp) 


Another consequence of (5.4) is that the Fisher information matrix I{p) is equal to the corresponding 
block of the Fisher information matrix I of the full (d^-dimensional) unconstrained model with 
parametrisation 9 defined in (5.3). We will now compute the average over states of the Fisher 


information with respect to the Pauli bases measurements, by showing that it is equal to the average 
Fisher information at po, for the random basis measurement. As the different settings are measured 
independently, the Fisher information I{p) is (and similarly for I) 

Hp) = 

sCiSk 


where 7(p|s) is the Fisher information corresponding to the von Neumann measurement with respect 
to the ONB defined by setting s. More generally, with B;/ as defined above, we denote by I{p\Bu) 
the Fisher information corresponding to this basis. Due to the rotation symmetry, we have 

iiUpU*]Bu) = i{p\B) 


so 


TT{dp)i{p) 


^ I n{dp)iip\s) = ^ I p^{dU)i{UpoU*\s) 
3 '^y’ p{dU)i{po\Bu) = 3>^I. 
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where fj,‘^{dU) is the unique Haar measure on the unitary group on C'^. 

The average Fisher information matrix / (of the full, unconstrained model) is computed in section 
where we show that the block corresponding to 0 parameters has average I = ‘2^-2r{d-r) such 


8.6 


that the lower bound is 


R. 


minmax 


(r, k) > 2r{d — r). 


6 Numerical results 


In this section we present the results of a simulation study which analyses the performance of the 
proposed estimation methods. The penalised and physical estimators discussed in the previous 
sections use a theoretical penalisation and respective threshold rates proportional to . However in 
practice we found that the performance of the estimators can be further improved when the rates 
are adjusted by multiplying with an appropriate constant c - whose choice is informed by the data 
- from a grid over a small interval which was chosen to be [0, 3]. The last two estimators are such 
versions of the theoretical ones with constant c chosen by using cross-validation methods which are 
explained in detail in section 6.1 We will compare the following 5 estimators described below. 


1. the least squares estimator defined in (3.11. 


2. the oracle “estimator” defined below. This is strictly speaking not an estimator 

since it requires the knowledge of the state p itself, and can be computed only in simulation 
studies. However, the oracle is a useful benchmark for evaluating the performance of the other 
estimators. 

3. the cross-validated projection estimator Here we try to find the optimal truncation rank 

of the least squares estimator, by using the cross-validation method. 

4. the cross-validated penalised estimator This is a modification of the penalised es¬ 


timator defined in (4.2), where the value of the penalisation constant is adjusted by 

cross-validation. 

5. the cross-validated physical threshold estimator This is a modification of the phys¬ 
ical estimator defined in (4.5), where the value of the threshold constant is adjusted by 

cross-validation. 


We explore the estimators’ behaviour by simulating datasets from states with different ranks, and 
with different number of measurement repetitions per setting. The methodology is described in 
detail below. 


6.1 Generation of random states and simulation of datasets 

In order to generate a density matrix of rank r, we first create a rank r upper triangular matrix T 
in which 

i) the off-diagonal elements of the first r rows are random complex numbers, 

ii) the diagonal elements T 22 ,..., Tj-r are real, positive random numbers, 

iii) all elements of the rows r -|- 1,..., c? are zero. 
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The matrix T is completed by setting Tn such that Tn > 0, and llTjlf = 1. If these conditions 
cannot be satisfied we repeat the procedure by generating a new set of matrix elements for T. When 
successful, we set p := T*T which by construction is a density matrix of rank r. We note that it 
is not our purpose to generate matrices from a particular “uniform” ensemble, but merely to have 
a state with reasonably random eigenvectors, and whose r eigenvalues are not significantly smaller 
than l/r. Following this procedure we have generated 4 states of 4 ions (d = 2^) with ranks 1, 2,6,10. 
The rank 6 state for instance, has non-zero eigenvalues (0.47,0.19,0.12,0.11,0.07,0.04). 

For each state, we have then simulated a number of 100 independent datasets with a given number 
of repetitions chosen from the range 20,100, 500, and 2500. In this way we can study the dependence 
of the MSE of each estimator on state (or rank) and number of repetitions. 


6.2 Computation of estimators 

We conducted the following simulation study for all the possible combinations between the states 
and the total number of cycles (i.e. 4 x 4 = 16 different scenarios). Below, we denote by r the rank 
of the “true” state p, from which the data has been generated. The procedure has the following 
steps: 


1. For a given number of repetitions n, we simulate 5 independent datasets Di,..., D^, each 
with n/5 repetitions. By simply adding the number of counts for each setting and outcome, 
we obtain a dataset D oi n repetitions. However, as we will see below, having 5 separate 
“smaller” datasets is important for the purpose of applying cross-validation. Note that such a 
procedure can be easily implemented in an experimental setting. 

2. We compute the least square estimator based on the full dataset D with total number of 
cycles n. 


3. We compute the oracle “estimator” as follows: 


(a) we compute the spectral decomposition (4.11 of with the eigenvalues \i arranged in 


decreasing order of their absolute values. For each rank 1 < k < d we define the truncated 
(least squares) matrix 


p«(k) = 


i=l 


(b) we then evaluate the norm-two (Frobenius) distance e(K) := ||p — Pn ('«)||2 define the 
oracle estimator as the truncated estimator with minimal norm two error 

^oracle) ^ kq = argmine(«:). 

Note that the oracle estimator relies on the knowledge of the true state p which is not 
available in a real data set-up. It is nevertheless useful as a benchmark for judging the 
performance of other estimators in simulation studies. At the next point we define the 
cross-validation estimator which tries to find the “optimal” rank kq by replacing the 
unknown state p with the least squares estimator computed on a separate batch of data. 


4. We compute the cross validation estimator as follows. 

(a) For each j € {1,..., 5} we compute the following estimators. While holding the batch Dj 
out, we compute the least squares estimator for the dataset consisting of joining 

the remaining 4 batches together. Similarly to the point above, we define the rank n 
truncation of this estimator by pn--j{K). We also compute the least squares estimator for 
the remaining batch j, denoted by ■ 
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(b) For each rank k we evaluate the “empirical discrepancy' 


CV(k) 


1 

5 




2 

2 


Since pn--j{K) and are independent, and the least squares estimator is unbiased 
) = Pi the expected value of CV(fc) is 


Tr 

= E_i [Tr (p„;_i(k)^)] - 2Tr (E_i [p„;_i(k)] p) + Tr(p^) + C 

= E_l[||p„;_l(«)-pf]+C 


E [CV(/c)] = E_i [Tr (p„;_i(«;)^)] - 2Tr (e^i [;o„;_i(/c)] • Ei 


Pn:l 


' El 


where we denoted E_i and Ei the expectation over all batches except the first, and 
respectively over the first batch. Therefore, the average of CV(k) is equal to the mean 
square error of the truncated estimator p„;_i(k), up to a constant C which is independent 
of K. 

(c) Based on the above observation we use the the cross-validation method as a proxy for the 
oracle estimator. Concretely, we minimize CV(k) with respect to k 


kcv = argmin CV(«;), 

k 


and define the cross-validation estimator as the truncation to rank kcv of the full data 
least squares estimator := Pnikcv)- 

5. We compute the cross-validated rank-penalised estimator as follows. 

(a) Let c be a penalisation constant chosen from a suitable set of discrete values in the interval 

[0,3]. Similarly to the cross-validation procedure, we hold out batch j, and we compute 
the rank-penalised estimator ( |4.2[ ), with penalty constant cv^ for j = 1,..., 5. We denote 
these estimators by We will also need the least square estimator for batch 

j computed above. 

(b) For each value of c we evaluate the empirical discrepancy 


CV(c) 


1 

5 


5 

\\Pn--i 

i=l 


(c) -P 


{Is) 

n\3 


2 

2 


and minimize CV(c) with respect to the constant c 


c = argmin CV(c). 

c 


Finally we compute the cross-validated rank penalised estimator which is defined 


as in (4.2), with constant cP^, on the whole dataset D. 


6. We compute the physical estimator as follows. 

(a) As above we choose a constant c from a grid over the interval [0,3]. We hold out batch 
j, and we compute the physical threshold estimator (4.51, using the algorithm below this 


equation, with threshold c • Av. We denote the resulting estimators by for 

j = 1,...,5. 


16 












(b) For each value of c we evaluate the empirical discrepancy 




Z=1 


We then minimize CV(c) with respect to the constant c. 


2 

2 


c = argmin CV(c) 

c 


Finally we compute the cross-validated physical estimator which is defined as 

in (4.51, with constant c • 4i/. 


6.3 Simulation results 

We collect here the results of the simulation study described in the previous section. As a figure 
of merit we focus on the mean square error E((||p„ — pjjf) of each estimator, which is estimated 
by averaging the square errors over the 100 independent repetitions of the procedure. We are also 
interested in how the different methods perform relative to each other, and whether the selected 
rank is consistent, i.e. it concentrates on the rank of the true state for large number of repetitions. 


Rank 1 


Rank 2 



a) MSEs for a rank 1 state 


b) MSEs for a rank 2 state 


Rank 6 



Rank 10 



c) MSEs for a rank 6 state 


d) MSEs for a rank 10 state 


Figure 3: Boxplots for the estimated mean square error (E|1;0„ — plU) 
for different ranks, k = 4 {d = 16), with n = 20 repetitions 


17 








































































































The four panels in Figure represent the boxplots of the square errors ||p„ — plU for the different 
estimators, and different states, when the number of repetitions is n = 20. Similarly, Figure]^ shows 
the same boxplots at n = 100. As expected, in both cases the least squares performs significantly 
worse than the other estimators, and the discrepancy is larger for small rank states. The remaining 
4 estimators have similar MSE’s with the physical one performing slightly better than the rest, 
followed by the oracle. Note also that the estimators’ variances (indicated by the size of the boxes) 
are larger for the least squares than the other estimators. A similar behaviour has been observed 
for n = 500, 2500 repetitions. 


Rank 1 Rank 2 



a) MSEs for a rank 1 state b) MSEs for a rank 2 state 


Rank 6 


Rank 10 




c) MSEs for a rank 6 state 


d) MSEs for a rank 10 state 


Eigure 4: Boxplots for the estimated mean square error (E|l;d„ — p|||) 
for different ranks, k = 4 {d = 16), with n = 100 repetitions 


Figure [^illustrates the dependence of the MSE of a given estimator, as a function of n, for the four 
different states which have been analysed. Since the MSE decreases as n~^ we have chosen to plot 
the “renormalised” MSE given by n ■ E|lp„ — plj^, which converges to a constant value for large n. 
As expected the limiting value increases with the rank of the state, as a proxy for the number of 
parameters to be estimated. 
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Figure 5: Renormalised MSEs n ■ EUpVi — p \\2 as a function of the number of repetitions for states 
with different ranks: 1 (black), 2 (red), 6 (green), 10 (blue) 


The histograms in Figure show the probability distributions for the chosen rank of each given 
estimator, as a function of the number of measurement repetitions n, for the state of rank 6. We 
note that in all cases the proportion of times that the chosen rank is equal to the true rank of the state 
increases as with the number of repetitions. However, this convergence towards a “rank-consistent” 
estimator is rather slow, as the proportion surpasses 80% only when n = 2500. Another observation 
is that the penalty and threshold estimators appear to have different behaviours: the former tends 
to underestimate the true rank, while the latter tends to overestimate it. As expected, the oracle 
estimator is more likely to choose the correct rank for large number of repetitions. Perhaps slightly 
more surprising, for small number of repetitions (n = 20), the oracle choose a pure state in most 
cases. 
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a) Oracle estimator 


b) Cross-validation estimator 



500 Cycles 


2500 Cycles 



500 Cycles 2500 Cycles 



Rank Rank 


c) Penalty estimator 


d) Positive estimator 


Figure 6: Histograms of the empirical frequencies of the chosen rank for different estimators, as 
function of the number of repetitions n, true rank r = 6, fc = 4 (d = 16) 


7 Conclusions and outlook 

Quantum state tomography, and in particular multiple ions tomography is an important enabling 
component of quantum engineering experiments. Since full quantum tomography becomes unfeasible 
for large dimensional systems, it is useful to identify lower dimensional models with good approxi¬ 
mation properties for physically relevant states, and to develop estimation methods tailored for such 
models. In particular, quantum states created in the lab are often very well approximated by low 
rank density matrices, which are characterised by a number of parameter which is linear rather than 
quadratic in the space dimension. 

In this work we analysed several estimation algorithms targeted at estimating low rank states in 
multiple ions tomography. The procedure consists in computing the least squares estimator, which 
is then diagonalised, truncated to an appropriate smaller rank by setting eigenvalues below a “noise 
threshold” to zero, and normalised. Among the several truncation methods proposed, the best per¬ 
forming one is the “physical estimator”; this chooses the density matrix whose non-zero eigenvalues 
are above a certain threshold and is the closest to the least squares estimator. We proved concentra¬ 
tion bounds and upper bounds for the mean square error of the penalised and physical estimators, 
as well as a lower bound for the asymptotic minimax rate for multiple ions tomography. The re¬ 
sults show that the proposed methods have an optimal dependence on rank and dimension, up to a 
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logarithmic factor in dimension. In addition, the algorithms are easy to implement numerically and 
their computational complexity is determined by that of the least squares estimator. 

An interesting future direction is to extend the spectral thresholding methodology to a measurement 
setup where a smaller number of settings is measured, which is however sufficient to identify the 
unknown low rank state. Another direction involves the construction of confidence intervals / regions 
for such estimators, beyond the concentration bounds established here. 

Acknowledgements. M.G.’s work was supported by the EPSRC Grant No. EP/J009776/1. 


8 Appendix 

8.1 Lemma on A* ■ A 


Lemma 1. Let A be the linear map defined in equation (2.5). Then A* ■ A is diagonal and 

[A* • A]b.b = for all b G {/, :r, y, z}\ 

Proof. Next, we give here for the reader’s convenience the proof of Lemma similar to that in [S]. 
Let us compute 

[A-A]t,b.=i:E Ab(o|s)Ab'(o|s). 

s o 

If b = b' it is easy to see that 


[A* . A]b.b = E E n 

S O j^Eb 


If b 7 ^ b', we have either Eb = Eb' or E^, E-^i. On the one hand, in case the sets E^, and E^,/ are 

equal, we have 

Ab(o|s)Ab'(o|s) = Y[ I{sj = bj) ■ /(sj = 6'). 

jiEb 

For each s, the previous product is 0. Indeed, if different from 0 then bj = b'j for all j not in Ey^. As 
bj = bj! = I for j in Ey,, it implies that b = b' which contradicts the assumption here. 

On the other hand, if the sets Ey, and Ey,/ are different, there exists at least one coordinate jo in the 
symmetric difference EyAEyi> and the sum over outcomes o will split over values of o where Ojg = 1 
and values where Ojg = — 1: 

E A(o|s)Ab'(o|s) 

O 

= E n n 

o:Ojo=l ji^Eb l^Eb, 

- E n ■ n 

l^Ebi 

We assumed here that jo belongs to Ey,\Ey,i and the same holds for jo in Ey,i\Ey,. □ 

8.2 Proof of Proposition 

Proof of Proposition^ Note that /(o|s) = ^ = o), where the random variables Xgj are 

independent for all settings s and all i from 1 to n. To estimate the risk of the linear estimator we 
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write 




“ p(o|s)) 

bos 


Ab(o|s) 

2fe3d(b) 


mm ^ =o) - p(ois)) 

bos 2 


y4b(o|s) 
2 fe 3 d(b) ^b 


EE^-- 


where V14,i are independent and centered Hermitian random matrices. We will apply the following 
extension of the Bernstein matrix inequality [38] due to |39|, see also |3T|, [32] . 

Proposition 3 (Bernstein inequality, |39jl. LetYi, ...,Yn be independent, centered, mxm Hermitian 
random matrices. Suppose that, for some constants V, W > 0 we have ||y, || < V, for all j from 1 
to n, and that || E(Yj?^)|| < W. Then, for all t > 0, 

P(||n + ... + r„|| > t) < 2mexp ■ 


In our setup we bound ||Ws,i|l — ^ for ah s and i and |j ^ where V, W are 

evaluated below. We have 


II< -EE 


b o 


Ab(o|s) 

2fe3d(b) 


= o) -p(o|s)| • llcTbl 


^ -E 


1 


n ^ 2'=3'^('’) 

b 




^ 2 A ^ 1 __ 2 A 

~ n2^ 3 ^ n2^ 

t=0 b:d(b)=l 1=0 


£) 3^ n2>^ 


2 /2 


n V 3 


:= V. 


Let us denote i?(o|s) := 2 ^3 ‘^^’^Mb(o|s)(Tb- Then 

iiEE 

s i 

= ^nEEE B*(ols) ■ Cov(/(Xs,, = o),/(X,,, = o')) • B(o'|s)|| 

s i o,o' 

^ ;^llEEE^*Hs)-i?(o|s)||. (8.1) 


The last inequality follows from the fact that the covariance matrix can be written as the difference 
of two positive matrices Covo,o' := Gov{I{Xsd = o),I{Xs,i = o')) = p{o\s)So,o' — p(o|s)p(o'|s) and 
since p(o|s) < 1, we get Cov < 1. 

We replace i?(o|s) in (|8.1|) and use Lemma[^to get 


< 


iiEE E(WTWs.*)|| 

s i 

I EEE 22 fc 3 d(b)+d(bO -^b(o|s)A'(o|s) -abCTb^ 

s o b.b' 


< 


— E E - 

n2k 3^ 

e=o b-.d{h)=e 



1 

n 



:= W. 
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Apply the matrix Bernstein inequality in Proposition to get, for any t > 0: 


<2'=+iexp 


n / 3 


1 + 2t/3 2 V2 


We choose t > 0 such that 


e = 2*^+^ 


exp 


n / 3 


l + 2t/3 2 V2 


which leads to t such that 


= -(h log 


l + 2t/3 n V3 


2fc+ 




Then the convenient choice of t is i/(e) = v{e)/2 + ^v^{e)l2 + 3/2 • v{e), which is equivalent to 
y/ 3/2 • v{e) when this last term tends to 0. □ 

8.3 Proof of Proposition 

Proof of Proposition^ Let us denote by (Ai > ... > Ad) the eigenvalues of and by Ai,...,Ad 

the eigenvalues of the resulting estimator pi**’"!. It is easy to see that the latter has the same 
eigenvectors as Therefore 

d 

(Ai,...,Ad)= arg min ^(A^-- A^)^. 

This optimisation has the explicit solution 


~ L L ^ ^ 

^3 = ^3 + T ’ 


Note that 


i=i 


^ = itr(p-^„'^))< -p||. 


Therefore, Hpi**’"! — = L/2 < — p\\ and thus 


□ 


8.4 Proof of Corollary 

Proof of Corollary [7J We order the eigenvalues of in decreasing order of absolute values (| Ai | > 

... > I Adi), and denote by Ai,..., Ad the eigenvalues of As in the previous proof, we can see 

that both matrices will share the same eigenvectors and that the relation between the eigenvalues is 

~ L L ^ ^ ^ 

^3 =^ 3 + 2 ^ 2 ^ ^ ^ ~ ^ 

i=i 

Thus, 

ll^^pen) _ ^ ^(^/2)2 < Tr^)^/™) - p)/d < - p|||. 


23 








We deduce that |!/o- - p\\f < - p \\ f - Therefore, 

P(l|p<r”’”)-p||F>8c(0Me)2) <e 

and the previous inequality remains true for all 0 < e < e < 1. Let us denote by 

j'fj 0 f1 

x{e) = 9,rv‘^{e) = 16— log( —), 

N e 

which is a decreasing function of e. This implies that e = 2(iexp(—Thus, 


pOO 

Ep||^~)-p||2 = / P,(||^~)-p||2>x)drr 


pOO 

P,(||p<™) - P\\l > ^)dx + / Ppdl^r”’”^ - Pll2 > ^)dx 

J x(e) 

Nx 

{e)+ / 2dexp(--—)da; 

Jx( 6 ) Ifird 


C(£) 

rd ,2d^ rd , N , 

< 16—log( —) + 2d-16—exp(-—x(£)) 


, rd 


16rd 

rd 

'N 


N 


,2d 


< 16— log( —)+£ <(7—log( —). 


,rd, ,2d. 


iV 


□ 


8.5 Proof upper bound physical estimator 


Proof of Theorem^ We recall from Propositionthat with probability larger than 1 — £, we have 
p|l < 2 f(£). In particular, by using the Weyl inequality [40], this implies that |Afc —Afe| < 2 f(£) 
for all k from 1 to d, where Ai,..., A^^ are the eigenvalues of p arranged in decreasing order. 

After a total of £ = d — f iterations the algorithm stops and we have [41] 


Jfphys) _ _ ffphys) _ q 


and 


fiphys) 


:=y'' = y 


k>r 


j = l,...,r. 


From now on we assume that — p\\ < 2f(£) which holds with probability larger than 1 — £, cf. 

Corollary |3.2[ We will show that under this assumption f = r. For this we consider the two cases 
f > r and f < r separately. 


Suppose f > r. For j < f we have 


^fiphys) 






k<f 


— ~ f - Afc) 

k<r 

E \Xk — Xfz\ E “1“ — 4z/. 

k<.r 
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This implies that < Xf+Au = Av since f > r. However, as discussed above, by the definition 

of the threshold estimator, > Aiy. Therefore f cannot be larger than r. 

Suppose r < r. Then > Xr — — Xr\ > Sv — Ai> = Ai/. However this is not possible 

since = 0 under the assumption f < r. Thus, r = r with probability larger than 1 — e. 

We consider now the MSE of the estimator. As shown above, we have ~ ^ Let p = 

UDU* be the diagonalisation of p, where U is unitary and D is the diagonal matrix of eigenvalues. 
Similarly, let = UnDn‘^'^U* and be the decompositions of the least 

squares and the physical threshold estimator, where we take into account that the latter two have 
the same eigenbasis. Then 

\\^phys)_py = \\U^DiP'^y^W*-UDU*h 

< \\Ur,DiP'^y‘^'>U: - Ur,DU:h + WUuDU: - UDU^h 

< \\DiP^y^'>-Dh + \\UnDU:-UDU*\\2 ( 8 . 2 ) 

The first norm on the right-hand side of the previous inequality is bounded as 

_ £)||2 ^ _ Xjf < 16r^2_ 3 ) 

j<f 


For the second norm we use a similar triangle inequality for the operator norm 

\\u^Du:-uDu*\\ < + 

= p-i?('«)|| + ||pd«)-p|l<^ + ^ = 2^. 

The first term is smaller than v since |Ai — Af | < as it follows from Proposition and the Weyl 
inequality [40]. The second term is also bounded by v, by PropositionJ^ Therefore, since UnDU* 
and UDU* are rank r matrices, the difference is at most rank 2r and \\UnDU* — UDU*\\2 < 2v\f^. 
By plugging this together with (8.3) into the right side of (8.2) we get 

Il^rf""*^-Pll2<(2v^ + 4)W, 


with probability larger than 1 — e. 

Moreover, the previous inequality remains true for all 0 < e < e < 1. Let us denote by 

x{e) = A8ri^^{e) = 96— log( —)> 

which is a decreasing function of e. This implies that e = 2dexp(—Thus, 






p\\l > x)dx 
- p\\\ > x)dx + 



p\\l > x)dx 


< 

< 

< 


JO 


Jx{e) 


poo 

[;(£)-)- / 2dexp(— 
J x(£) 


Nx 


lx{s) 

^„rd, ,2d. „„rd , N , 

96— log( —) -b 2d ■ 96— exp(--—a;(e)) 


N 

,rd 


96rd 

rd 

fV 


)dx 


,2d 


96- (log(-)+£) ) <C-log(-). 




96rd 

,2d. 


N 


□ 
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8.6 The average Fisher information matrix for the full, unconstrained 
model, with random measurement design 


In this section we present a detailed calculation of the average quantum Fisher information at a 
the rank r state po defined in (5.21, for random measurements with uniform distribution over the 
measurement basis. We consider the full parametrisation by 0 € given by equation (5.3). As 
explained in the proof, the Fisher information matrix for the parametrisation 9 of the rotation model 
TZd,r is a particular block of the larger Fisher matrix computed here. We will come back to this at 
the end of the computation. 


Let 

Bu :={|o;[/) := C7|o) : o = 1,..., 2'=} 

denote the ONB obtained by rotating the standard basis by the unitary U. The Fisher info /(pjBy) 
associated to the measurement with ONB Bu is given by the matrix 


I{p\^u)a,b = 


y __ 


dpp{o\Bu) dpp{o\Bu) 


d9„. 




where Pp{o\Bu) is the probability of the outcome o 


Pp{o\Bu) 


= (o,17|p|o, [/) = y p*i|(o,{7|i)p 

i 

+ 2 y Re(p,j)Re((i|o, C7)(o, U\j)) + 2 y Im(p,,j)Im((i|o, C/)(o, U\j)) 

i<j i<j 


and the partial derivatives are given by 

dpp{o\Bu) 

dpip 

dpp{o\Bu) 

dKepij 

dpp{o\Bu) 


= \{o,U\^)\^ 

= 2Re((z|o,C7)(o,C/|j)) 
= 2Im((*|o,C/)(o,C7|j)). 


dlmpij 

The Fisher information matrix I{p\Bu) has the following block structure 

/ I‘^‘^{p\Bu) I‘^^{p\Bu) I'^^{p\Bu) \ 


I{p\Bu) = 


r'^iplBu) r^{p\Bu) /"(p|By) 

V PHpI^u) I"{p\Bu) P^pIBu) 


with superscripts indicating the type of parameter considered: diagonal, real or imaginary part of 
off-diagonal element. The average Fisher information for a randomly chosen basis is 


I{p) := / p{dU)I{p\Bu) 


where p{dU) is the Haar measure over unitaries used for choosing the random basis. Note that by 
symmetry / (p) only depends on the spectrum of p. We will not compute I (p) for an arbitrary state 
p but only at p = po. The corresponding Fisher information will be denoted I = I{po) and is a 
function of d and r. Below we compute the different blocks of /. For the matrix elements we will 
use a suggestive notation, e.g. denotes the element corresponding to the diagonal parameter 

= Pa and the real part of the off-diagonal element pjk, etc. 
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A) Diagonal-diagonal block. 




1 


,Pp(o|B£/) 


i(o,c/|z)n(o,[/|j)i^ 


o:pp(o|B(7)>0 " 

By integrating over unitaries we obtain the corresponding matrix element of the average Fisher 
matrix I. Since Ppo(o|B( 7 ) > 0 is true for all o, with probability one with respect to we 

drop the condition from the sum. At the state p = pq dehned in (|5.2[), we have 


fdd ^ Y 


= r-d 
= r-d 


{o,Um{o,U\3)\\,d 


|(i,c/K)n(i,c/|j)|2 , 






p'^idU) 






where h'‘^{d'ip) is the uniform measure over the projective space on C'^. To compute the integral we 
decompose IV') as 

W = q\tpi) + \J\ - (7^|^/’2) 

with I'f/'i) S dir '■= Span{|l),..., |r)} and |'!/' 2 ) G normalised (orthogonal) vectors, and 0 < g < 1. 
The uniform measure can be expressed as 

v^{d^) = m'^’'^[dq) X z/’'(dV'i) X i3‘^-^id3P2) 

where m'^’^{dq) is the distribution of the length of the projection of a random vector in onto an 
r-dimensional subspace. With this notation we have 


. = r ■ d 


1 




(8.4) 


We distinguish 4 sub-cases depending on whether each of the indices i,j belongs to {1,... ,r} or 


Sub-case 1: i,j < r. In this case (8.4) becomes 


jdd 


= r-dj ^q^m‘^’-{dq) X J {dU). 


(8.5) 


In the last line we have re-written as C/|l) in order to use the existing formulas for integrals of 
monomials over the unitary group |42j . Since we will use these formulas repeatedly, we recall that 




( 8 . 6 ) 


<7,r£S2 

where a and t are permutations in S 2 = {(1,1), (2)} and Wg*(-) is the Weingarten function over 32- 

Wg'(l,l) = ^^, Wg'(2)= 

The integral over q can be easily evaluated as 


i{p-iy 


q^m^^ydq)=Y [\{m\^pydy)="-. 

k=l d 


(8.7) 
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By inserting (8.6) and (8.7) into (8.5) we get 


jdd 


jdd 


r(r2-l)) r + 1’ * ^ 

r^/l 1\ 2r 


Sub-case 2: i <r and j > r. 


From (8.4) we get 


= ^-d 

[ lg"(l-9V’’'(rf9)x 


r r\ 1 1 

II 

1 , • • - 1. 


dJ r a — r 






Sub-case 3: i,j > r. Similarly, in this case 

^(l-g^)W’’'(dg)x J |(V’2|*)n(V'2|j)4''-’'(#2) 

= r-dj ^{1- q^fm'^’^dq) x J [/*,[/,,(8.8) 

We first simplify the integral on the right side 

I 1(1 - q^)V’^{dq) = J ^m^^^^idq) - 2 + ^. (8.9) 

To evaluate the remaining integral, we consider a multivariate Gaussian random variable c = 
(ci,..., C 2 d) ~ iV(0, l 2 d), and denote by g^'^{dc) its probability distribution. From this we construct 
the complex vector with uniform distribution over the unit ball in 




2d 


= El 


z=i 


We can now write 


lm‘'’"(d<?) = J /"(rfc) 


= 1 + 


y^2d 2 

=)fei44 = l+ ; 

E/=i cf J 

f 9^%dcy 


f 1 . 

/ ||c2p^^ 


Y^2d 2 

Z^l=2r-i-l H 

g^^idc^) 


= 1 + 2{d — r) ■ 


1 


d- 1 


( 8 . 10 ) 


2r — 2 r — 1 

Above we used the fact that ||c^||2 is a variable with 2r degrees of freedom, so the second integral 
is the mean of its inverse. By inserting (8.10) into (8.9) we obtain 


J ^(l-q^fm‘‘^-{dq) = -2+'^ + 


d—1 (d —r)(d —r + 1) 


r — 1 


d{r — 1) 


( 8 . 11 ) 


Finally, by inserting (8.11) into (8.8) and applying (8.6) we obtain 


jdd 


jdd 


rd 


r — 1 
2r 

r — 1 


(d — r)(d — r + 1) 
d(r — 1) 

, * 7^ 7 


(d —r)2 —1 (d — r)((d — r)2 — 1) 
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Note that for r = 1 these matrix elements are infinite. This is due to large contributions from 
measurements which have one basis vector close to being orthogonal to the one dimensional vector 
state.. This is somewhat akin to what happens in the case of a Bernoulli variable (coin toss), in the 
case when the probability is close to zero or one. While this phenomenon is interesting, it does not 
play any role in our analysis for which the diagonal matrix elements are not relevant parameters. 


B) Diagonal-Real block. 




1 


,Pp(o|By) 


|(o,C/|z)p.Re((j|o,C/)(o,C/|fc)) 


o:pp(o|Btj)>0 ' 

As before, the corresponding entry of the average Fisher information matrix is 


= ‘2-d-r 


= 2 ■ d ■ r 








Sub-case 1: i,j^ k < r. In this case the integral is 

= 2- d-r J q^m‘^’'^{dq) j \{i;i\i)\‘^Re{{j\'ipi){i^i\k))v^{d^i) 

= + UkiU^^) = 0 , 

where we applied formula (8.6) for the integrals over the unitaries. 

Sub-case 2: i,j < r and k > r. In this case the integral is 

■ j q\/l- q^m'^''^{dq) 


= ‘^■d-r 


X Re / \{'ipi\i)\^{j\i^i)iy^{d'ipi) / {-ip 2 \k)iy ’'(# 2 ) = 0. 


Sub-case 3: i < r and j, k > r. In this case the integral is 
dtik = 2-d-r [{1- 


X / \{i’i\i)\ 12 ^(dipi) ■ Re {j\')p 2 ){'>p 2 \k)iy ’'(dV' 2 ) = 0. 


Sub-case 4- 'I > c o,nd j, k < r. In this case the integral is 


= 2-d-r J il-q^)m‘^’^idq) 

X y l(V'2K)pi^'^"’'(#2) • Re J {j\'il}i){'il^i\k)v^idtlji) = 0. 

Sub-case 5: i > r and j, k < r. In this case the integral is 

dfr.jk = 2-d-r J(l- q^)m‘^’^{dq) 

X J |(V'2K)pi^'^"’'(dV'2) • Re J {j\'il)i){'il;i\k)v^{dtpi) = 0. 

Sub-case 6: i,k > r and j < r. In this case the integral is 

ifvjk = 2-d-r J q^/l-q‘^m^’^{dq) 

X Re j\{ip2\i)\'^{ip2\k)i''^~'^id-tfj2) ■ j{j\ipi)k''^idilJi) = 0 . 
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Sub-case 7; i,j,k > r. In this case the integral is 

X y l(V'2|i)pRe(j|V'2)(V'2|A:)i/‘^"’'((i-02) = 0. 

The last integral is zero for the same reason as in sub-case 1. 

In conclusion all diagonal-real matrix elements are zero 

dii-j,k = 0; for all i = 1,..., 2^, and 1 < j < k <2^. 


C) Diagonal-Imaginary block. 

Similarly to the case of diagonal-real elements, we obtain that all diagonal-imaginary matrix elements 
are zero 

dfkjk = 0; fo^ all i = 1,, 2^, and 1 < j < fc < 2^. 


D) Real-Real block. 

I{p\Bu)l^.,u = 4 


1 


E . ^ Re((z|o, U) (o, U\j)) ■ Re((fc|o, t/)(o, U\l)) 

^ Pp[o\Bu) 


o:pp(o|B[/)>0 

As before the corresponding entry of the average Fisher information matrix is 


ii;-ki = 4-dT 


= 4 • d • r 


Re((i|V')(V’|j)) ■ Be[{k\'ijj){'ij}\l)) ^ 

ELiKV'lfc)!^ 

^Re((i|V’)(iA|j)) ■ Be{{k\'il)){'ilj\l))rn‘^''^{dq)v''{d'il)i)v'^~^{d'il}2)- 


The same reasoning as before can be applied to transform the integral into a single integral, or a 
product of integrals over unitaries. If a single index is larger than r while the other 3 are smaller, or 
conversely a single index is smaller than r while the other are larger, then we have two integrals over 
unitaries which are zero since the monomial is of odd order. Therefore the following cases remain 
to be analysed. 

Sub-case 1: i,j, k,l < r. In this case the integral is 

dlvM = 4:-d-r [ q'^m^’''{dq) /Re((i|V'i)(V'i|j)) ' Re((fc|V'i)(V'ilO)'^’’Wi) 


= d ’ r ■ - ■ 

d 


p^idum.u^^ + c/,it/r,)(c/fcic/r, + unu^^)- 


Using formula (8.6) we find that the integral over unitaries is zero unless we deal with a diagonal 
matrix element of /, i.e. i = k and j = 1. For the latter we have 


jrr ^ d-r-- 


= 2 ■ d ■ r 


p^{dU)iUaU*^^ + C/,iC/rj(C/,iC/iT + UjiUl,) 


2r 


-1 r(r2-l)y r-\-l 


Sub-case 2: i,j, k,l > r. In this case, the integral is 


rr 


A-d-r J ^ m'^’'^{dq) j Re((i|V'2)(V'2b')) • Re((fc|V’2)(t/’2|0)^^‘^ '^('^V' 2 ) 

^ + C/,iC/i* )(C/feiC/r, + UnU^k)- 
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where we have used formula (8.11) for the integral over q. A similar calculation as above shows that 
all off-diagonal elements are zero and 


rrr ^ r - r){d - r + 1) ( 1 _1_ 

d(r-l) \{d-rf-l (d-r)((d-r)2-l) 

2r 

r — 1 

Sub-case 3: (i,j < r and k,l > r) or (i, j > r and k,l < r). In this case we deal with a product of 
integrals over unitaries of the form 

J fi^dU)UaU*^ = 0. 


so all matrix elements are zero. 

Sub-case 4- i,k < r and j,l > r. Again, the off-diagonal elements are zero and 


^m3= = 4:-d-r 

k^{dq) J 

Re((f|V’i)(V' 2 |j)) • Re{{i\ipi){tp 2 \j))’^"'{d^i)’^'"{dtl} 2 ) 

r, 1 

1 


- “ ^ d^-rd 

= 2. 

— r 


In conclusion, the only non-zero elements of the real-real block are on the diagonal and are given by 


2r 



r -1- 1 ’ 

1 < i < j < r, 

jrr _ 

2, 

i < r and r < j <2^, 


2r 



r — 1 ’ 

r <i < j <2^. 

E) Imaginary-Imaginary block. This 
are zero, and the diagonal ones are 

block is 

similar to the real-real one. All off diagonal elements 


2r 



r -1- 1 ’ 

1 < f < j < r, 

jii _ 

2, 

i < r and r < j <2^, 


2r 



r — 1 ’ 

r <i < j <2^. 


F) Real-Imaginary block. Next we show that all real-imaginary off-diagonal elements are equal to 
zero. 


/(p|B 


U)ij;kl 


= -4* 


E 


o:pp(o|Btj)>0 


Pp(o|B[/) 


Re((i|o, C/)(o, U\j))lm{{k\o, U){o, U\l)) 


By integration we obtain the corresponding matrix element of the average Fisher information matrix 


jrr 


— 4 ■d-r 
—4 ■d■r 


J n=iMkW 


Again, if one index is smaller that r while the other three are larger, or otherwise, then the matrix 
element in zero. We analyse the remaining cases. 
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Sub-case 1: i,j, k,l < r. In this case the integral is 




-A-d-r J q'^m^’^idq) J Re((i|^/>i)(^/>i|j))Im((fc|'0i)(^/ii|Z))i/’'(d'0i) 

z • d • r • ^ y ^l^{dU){U,lU:^ + UjiU^,){UkiU:i - UnU^f,). 


Using formula (8.6) we find that the integral over unitaries is zero unless i = k and j = I- However 
even in this case, two of the four terms are zero, and the other two cancel each other. 


Sub-case 2: i,j,k,l > r. Here the integrals over the unitaries are similar as in case 1 above, with 
the difference that they taken with respect to fi‘^~'^{dU). Therefore the matrix elements are zero. 

Sub-case 3: (i,j < r and k,l > r) or (i,j > r,k,l < r). In this case we deal with a product of 
integrals over unitaries of the form 


J fi'^{dU)UaU*^=0. 

so all matrix elements are zero. 


Sub-case 4- i,k < r and j,l > r Again, the off-diagonal elements are zero and 

= = -4 • d • r y(l - q^)m^''^{dq) J Re((i|V'i)(V'2|j)) • Im((i|V'i)(V'2|i))j^’'(dV'i)iz’'(dV'2) 

= *.dT.(l-^)y I 
= 0 . 


In the last integral, two terms are zero and two have different signs and cancel each other. In 
conclusion all elements of the real-imaginary block are equal to zero. 


Summary of the computation of I. We found that all off-diagonal blocks of I are zero 

Jdr _ jdi _ jri _ Jrd _ jid _ Jdr _ q 


Moreover the real and imaginary diagonal blocks are diagonal, equal to each other and have three 
distinct values depending on the position of the indices i < j with respect to r: 


{ Trr/ii _ 2r 

r+1’ 

= 2, 

Trr/ii _ 2r 

r-1 


Finally, the d x d block is not diagonal but has 


1 < z, j < r 

1 < z < r, and r < j <2^ 
r <i < j <2^. 

a simple form 


jdd 

— 2r 

1 < z < r 


r+1 ’ 


jdd 

2r 

r-1 ’ 

r < i <2^ 

jdd 

r 

— ^.+ 1? 

i,j < r, and i ^ j 

jdd 

r 

— ^.+ 1? 

1 < i < r and r < j <2 

jdd 

r 

— ^.+ 1? 

1 < i < "c and r < i < 2 

jdd 

_ r 

r— 1 

r < i,j, and i ^ j. 


The model TZd,r can be seen (locally) as the restriction of the full unconstrained model Sr,d parametrised 
by d, to the subset of parameters 9 which are real and imaginary parts of matrix elements pij with 
i < r < j. Therefore the corresponding average Fisher information matrix is equal to the corre¬ 
sponding block of /, i.e. I = 2l2r(d-r)- 

□ 
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